Discourse Segmentation of German Written Texts

Authors

  • Harald Lüngen
  • Csilla Puskás
  • Maja Bärenfänger
  • Mirco Hilbert
  • Henning Lobin
Abstract

Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves of the trees used to represent discourse structures. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while considering punctuation, morphology, syntax, and aspects of the logical document structure of a complex text type, namely scientific articles. The algorithm and implementation of a discourse segmenter based on these principles are presented, as well as an evaluation of test runs.


Similar resources

Discourse Segmentation of German Texts

This paper addresses the problem of segmenting German texts into minimal discourse units, as they are needed, for example, in RST-based discourse parsing. We discuss relevant variants of the problem, introduce the design of our annotation guidelines, and provide the results of an extensive interannotator agreement study of the corpus. Afterwards, we report on our experiments with three automati...


Subtopic annotation and automatic segmentation for news texts in Brazilian Portuguese

Subtopic segmentation aims to break documents into subtopical text passages, which develop a main topic in a text. Being capable of automatically detecting subtopics is very useful for several Natural Language Processing applications. For instance, in automatic summarisation, having the subtopics at hand enables the production of summaries with good subtopic coverage. Given the usefulness of su...


Coreference in Spoken vs. Written Texts: a Corpus-based Analysis

This paper describes an empirical study of coreference in spoken vs. written text. We focus on the comparison of two particular text types, interviews and popular science texts, as instances of spoken and written texts since they display quite different discourse structures. We believe, in fact, that the correlation of difficulties in coreference resolution and varying discourse structures requi...


A Rule Based Approach to Discourse Parsing

In this paper we present an overview of recent developments in discourse theory and parsing under the Linguistic Discourse Model (LDM) framework, a semantic theory of discourse structure. We give a novel approach to the problem of discourse segmentation based on discourse semantics and sketch a limited but robust approach to symbolic discourse parsing based on syntactic, semantic and lexical ru...


Tense, Modality and Polarity: The Finite Verbal Group in English and German Newsgroup Texts

This paper describes work in progress on a corpus-based study, comparing seemingly similar registers in two languages: English and German newsgroup texts, collected in the Bremen Translation Corpus. Systemic Functional Grammar (SFG, Halliday 1994 [1985]) provides a theoretical framework for categorizing empirical findings. I will focus on three systems of the finite verbal group, i.e. tense, mo...




Publication date: 2006